A data mining approach to analysis and prediction of movie ratings

Author

  • M. Saraee
Abstract

This paper details our analysis of the Internet Movie Database (IMDb), a free, user-maintained online resource of production details for over 390,000 movies, television series and video games. It contains information such as title, genre, box-office takings, cast credits and user ratings. We gather a series of interesting facts and relationships using a variety of data mining techniques. In particular, we concentrate on attributes relevant to the user ratings of movies: whether big-budget films are more popular than their low-budget counterparts, whether any relationship can be established among movies produced during the "golden age" (e.g. Citizen Kane, It’s A Wonderful Life), and whether particular actors or actresses are likely to help a movie succeed. The paper also reports on the techniques used, describing their implementation and usefulness. We found that the IMDb is difficult to mine because of the format of the source data. We also found that the budget of a film is no indication of how well rated it will be, that there is a downward trend in the quality of films over time, and that the director and the actors/actresses involved in a film are the most important factors in its success or lack thereof. The data used in this paper is not freely distributable and remains copyright of the Internet Movie Database Inc. It is used here within the terms of their copying policy. Further distribution of the source data used in this paper may be prohibited.


Similar resources

Prediction of user's trustworthiness in web-based social networks via text mining

In social networks, users need a proper estimation of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...


Who Rated What: a combination of SVD, correlation and frequent sequence mining

KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1 “Who Rated What in 2006” where the task is to predict which users rated which movies in 2006. We use the combination of the following predictors, listed in the order of their efficiency in the prediction: • The predicted number of ratings for each movie based on time series predictio...


Utilizing the Open Movie Database API for Predicting the Review Class of Movies

In this paper, we present our contribution to the Linked Data Mining Challenge 2015. Our approach predicts the review class of movies using external data from the Open Movie Database API (OMDb-API). We select specific features, such as movie ratings and box office, that are very likely to describe the quality of a movie. With RapidMiner we utilize these features and apply three basic classifica...


Customer Retention Based on the Number of Purchase: A Data Mining Approach

Purpose: This study seeks to find any relationship between the number of purchases and the income the customer brings to the company. The aim is to find, with the help of data mining tools, those customers who buy more than one life insurance policy and at the same time show signs of good payment. Design/methodology/approach: The approach of this research is to use data mining tools...


Modelling Customer Attraction Prediction in Customer Relation Management using Decision Tree: A Data Mining Approach

In today’s quality-based competitive world, known as the knowledge age, customer attraction is of ultimate importance. In keeping with the slogan “the customer is always right”, customer relation management is the core of organizational strategy, playing an important role in four aspects: customer identification, customer attraction, customer retention, and customer satisfaction. Commercial organiza...




Publication date: 2004